Rough sets for spam filtering: Selecting appropriate decision rules for boundary e-mail classification

نویسندگان

  • Noemí Pérez-Díaz
  • David Ruano-Ordás
  • José Ramon Méndez
  • Juan F. Gálvez
  • Florentino Fernández Riverola
چکیده

Nowadays, spam represents an extensive subset of the information delivered through Internet involving all unsolicited and disturbing communications received while using different services including e-mail, weblogs and forums. In this context, this paper reviews and brings together previous approaches and novel alternatives for applying rough set (RS) theory to the spam filtering domain by defining three different rule execution schemes: MFD (most frequent decision), LNO (largest number of objects) and LTS (largest total strength). With the goal of correctly assessing the suitability of the proposed algorithms, pam classification ough sets ule execution schemes ontent-based techniques odel evaluation we specifically address and analyse significant questions for appropriate model validation like corpus selection, preprocessing and representational issues, as well as different specific benchmarking measures. From the experiments carried out using several execution schemes for selecting appropriate decision rules generated by rough sets, we conclude that the proposed algorithms can outperform other wellknown anti-spam filtering techniques such as support vector machines (SVM), Adaboost and different types of Bayes classifiers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Rough sets theory in site selection decision making for water reservoirs

Rough Sets theory is a mathematical approach for analysis of a vague description of objects presented by a well-known mathematician, Pawlak (1982, 1991). This paper explores the use of Rough Sets theory in site location investigation of buried concrete water reservoirs. Making an appropriate decision in site location can always avoid unnecessary expensive costs which is very important in constr...

متن کامل

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...

متن کامل

Voting-based Classification for E-mail Spam Detection

The problem of spam e-mail has gained a tremendous amount of attention. Although entities tend to use e-mail spam filter applications to filter out received spam e-mails, marketing companies still tend to send unsolicited emails in bulk and users still receive a reasonable amount of spam e-mail despite those filtering applications. This work proposes a new method for classifying emails into spa...

متن کامل

Improving spam mail filtering using classification algorithms with discretization Filter

1 ME Computer Student, 2 Professor 1,2 Computer Department, Pimpri Chinchwad College of Engineering, Nigdi, Pune, Maharashtra, INDIA _________________________________________________________________________ Abstract: Email spam or junk e-mail is one of the major problems of the today's usage of Internet, which carries financial damage to organizations and annoying individual users. Among the ap...

متن کامل

Bayesian Spam Filtering Mechanism Based on Decision Tree of Attribute Set Dependence in the MapReduce Framework

Bayesian spam filtering is a classification method based on the theory of probability and statistics, and the Bayesian spam filtering based on Mapreduce can solve the defect of the traditional Bayesian spam filtering that consumes large amounts of system resources and network resources when the mail set is pre-training. It needs to classify mails manually in the pre-training phase of mail set, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Appl. Soft Comput.

دوره 12  شماره 

صفحات  -

تاریخ انتشار 2012